2  Calculating neighborhood densities

2.1 Overview

This tutorial follows from Section 2. “Modeling considerations when estimating CDD” of the main text:

Section 2. “Modeling considerations when estimating CDD”

For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance (Uriarte et al. 2010; Canham et al. 2006). Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.

Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the user’s dataset contains the location of mapped stems. If locations are not known (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.

We then demonstrate how to weight the calculation of neighborhood density by individual neighbor size (i.e., basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effects of density values to saturate with increasing distances (Uriarte et al. 2010; Canham et al. 2006).

To assess which shape parameters of the exponential decay function are most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.

From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It’s possible to make this approach more efficient by subdividing the data (e.g., by plot) or using more efficient data structures such as data.table.

We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data (see Barber et al. 2022). An excellent case study using the Stan programming language is available here.

Note

The following code is heavily adapted from the latitudinalCNDD repository by Lisa Hülsmann from the publication Hülsmann et al. (2024).

2.2 Load libraries and data

Code
# Load libraries
library(tidyr) # For data manipulation
library(dplyr) # For data manipulation
library(ggplot2) # For data plotting
library(spatstat.geom) # For spatial analysis
library(here) # For managing working directories
library(mgcv) # For fitting gams
library(lubridate) # For calculating census intervals
library(broom) # For processing fit models
library(purrr) # For data manipulation
library(kableExtra) # For table styling

2.3 Data format explanation

For this tutorial, we will be using a small example subset of data from Barro Colorado Island (BCI), Panama (available here) that includes 7,028 observations, of 3,771 individuals of 16 tree species across two censuses from the larger 50 ha BCI Forest Dynamics Plot data set (more information here). Each stem is mapped and its diameter at 1.3m above ground is measured (DBH), which allows us to calculate neighborhood densities across different distances, and as a function of neighbor size (e.g., basal area), respectively.

The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:

  • treeID: unique identifier for each tree

  • sp: species code

  • gx: spatial coordinate on x axis

  • gy: spatial coordinate on y axis

  • dbh: diameter at breast height (mm)

  • ba: basal area (m^2)

  • status: status at census, A = alive, D = dead

  • date: date of observation

  • census: census number

  • mort: mortality status at census, 1 = dead, 0 = alive

  • mort_next: mortality status at next census, 1 = dead, 0 = alive

  • interval: time interval between censuses in years

Let’s take a quick look at the data set we’ll be working with:

Code
head(dat, n = 5)
# A tibble: 5 × 12
  treeID sp        gx    gy   dbh     ba status date        mort mort_next
  <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <chr>  <date>     <dbl>     <dbl>
1 3884   ceibpe  296.  24.8  1500 1.77   A      1981-11-10     0         0
2 3998   cordbi  280.  45.4   281 0.0620 A      1981-12-11     0         0
3 4065   ceibpe  289. 266.   2000 3.14   A      1982-01-08     0         0
4 4070   cordbi  283. 287.    356 0.0995 A      1982-01-08     0         0
5 4119   ceibpe  258. 224.   1600 2.01   A      1981-12-13     0         0
# ℹ 2 more variables: interval <dbl>, census <dbl>

We can produce a plot Figure 2.1 of the tree locations, where the size of the point is scaled to basal area and colored by species, with each census displayed as a panel:

Code
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) + 
  geom_point() + 
  facet_wrap(~census) +
  theme_bw(10) + 
  theme(legend.position = "right") + 
  coord_fixed(ratio = 1) + 
  labs(size = "Basal area", col = "Species")

Figure 2.1: Map of tree locations by census. Points are individual stems scaled by basal area, and colors represent the six-letter identifiers of 16 common species at the BCI forest plot.

2.4 Define an exponential decay function

In our neighborhood, we model neighborhood densities using an exponential decay function. In principle, it’s possible to use various different decay functions to explore various ranges that determine the rate of decay.

Here, we present the general framework for how one might explore different scenarios, though the ultimate choice of decay function and parameter values will depend on the study and is beyond the scope of this Appendix. Note that this method of calculating neighborhood density is only possible when neighbors’ x-y coordinates are known.

First, we write a function in R for exponential decay with distance, where mu controls the shape of the curve:

Code
exponential_decay <- function(mu, distance){
  return(exp(-(1/mu * distance)))
}

Let’s see what the exponential decay function looks like across a range of mu values (Figure 2.2):

Code
# Set range of mu values and distances
decay_values <- seq(from = 1, to = 25, by = 2)

# Use sprintf to add leading 0s, will help with sorting later on
decay_names = paste("exp", sprintf("%02s", decay_values), sep = "") 

distances <- seq(1, 100, 1)

# Generate a dataframe with each combination of mu and distance
example_decay_values <- expand_grid(decay_values, distances) %>%
                        # Rename columns
                        rename(decay_value = decay_values, 
                               distance = distances)

# Evaluate distance decay function for each combination of 
# mu and distance
example_decay_values$decay <- exponential_decay(
                              mu = example_decay_values$decay_value,
                              distance = example_decay_values$distance)

# Plot results
ggplot(example_decay_values, 
       aes(x = distance, y = decay, 
           color = decay_value, group = decay_value)) + 
  ylab("Density") + 
  xlab("Distance (m)") + 
  scale_color_continuous(name = "Value of \ndecay constant") + 
  geom_line() + 
  theme_bw(12)

Figure 2.2: Plot of decay function

2.4.1 Determine which trees are at edge of plot

Trees near the edge of plot boundaries have incomplete information about their neighborhood, because neighboring trees outside of the plot boundaries are not mapped, and we do not advise using boundary corrections to include those stems.Typically trees within certain distance threshold from the plot edge are excluded from analysis, but still included in calculations of neighborhood densities.

We are going to add a column to our data set called ‘edge’ that is TRUE if within a set distance to the edge of the plot (e.g., 30 m in the example below).

For our example data set, the dimensions of the plot are 300 x 300 m, ranging from 0-300 on both the x and y axis, and representing a subset of the overall 50 ha forest dynamics plot at BCI.

Code
# Set threshold for distance to edge
distance_threshold_edge = 30

# Add in min and max values for corners of plot
min_x <- 0
max_x <- 300
min_y <- 0
max_y <- 300

dat$edge = dat$gx < min_x + distance_threshold_edge |
           dat$gx > max_x - distance_threshold_edge |
           dat$gy < min_y + distance_threshold_edge |
           dat$gy > max_y - distance_threshold_edge

# How many trees fall within the edge threshold?
table(dat$edge)

FALSE  TRUE 
 4526  2502 

Below is a plot Figure 2.3 of tree locations colored by whether they fall within the edge threshold or not, separated out for each census.

Code
ggplot(dat, aes(x = gx, y = gy, col = edge)) + 
  geom_point() + 
  facet_wrap(~census) +
  coord_fixed(ratio = 1) + 
  theme_bw(12)

Figure 2.3: Map of tree locations colored by whether they fall within the edge threshold

2.5 Calculate distances among focal individual and neighbors

Next, we will calculate distances among individuals in our plot, using an upper threshold distance of what to consider a neighbor.

We use the ‘spatstat.geom’ package to efficiently calculate distances among individuals.

Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.

If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each individual in each census interval, using only the individuals that were alive at the beginning of that census interval, or using an average density value between two consecutive censuses.

Code
# Set distance threshold for considering neighbors
distance_threshold_neighborhood <- 30

# Subset to first census - this will be different for different 
# datasets
dat_first_census <- dat %>%
                    filter(census == 1)

# Format into 'ppp' object
dat_ppp = spatstat.geom::ppp(dat_first_census$gx, dat_first_census$gy, 
                             window = owin(range(dat$gx), 
                                           range(dat$gy)), 
                                           checkdup = F)

# Determine close pairs based on distance threshold
# Returns a list that we convert to tibble later
neighbors = spatstat.geom::closepairs(dat_ppp, 
              rmax = distance_threshold_neighborhood, # Max radius
              what = "ijd", # return indicies i, j, and distance 
              twice = TRUE)

# Convert to dataframe
neighbors <- as_tibble(neighbors)

# Take a peek at the data
# i = index of focal individual
# j = index of neighbor
# d = distance to neighbor
head(neighbors)
# A tibble: 6 × 3
      i     j     d
  <int> <int> <dbl>
1  3227  3252 27.0 
2  3227  3238 17.1 
3  3227  3240  5.75
4  3227  3229 18.6 
5  3227  3253 24.4 
6  3227  3222 20.8 

Next we add in additional columns for individual characteristic, (e.g., species, size) and whether they are located near the edge of the plot (blue area in Figure 2.3).

Code
# add additional columns

# Add species for individual i
neighbors$sp_i = dat_first_census$sp[neighbors$i] 

# Add whether individual i is near edge
neighbors$edge_i = dat_first_census$edge[neighbors$i] 

# Add species for individual j
neighbors$sp_j = dat_first_census$sp[neighbors$j] 

# Add basal area of individual j
neighbors$ba_j = dat_first_census$ba[neighbors$j] 

We then want to add a column that indicates whether the comparison between the focal individual and the neighbor is conspecific or heterospecific because we are interested separately estimating the densities of conspecifics and heterospecifics.

Code
neighbors$comparison_type <- ifelse(neighbors$sp_i == neighbors$sp_j,
                                    yes = "con", # conspecific
                                    no = "het") # heterospecific

We then remove focal trees that are too close to the edge of the plot.

Code
# remove focal trees i that are at the edge
neighbors = neighbors[!neighbors$edge_i, ]

Next, we add columns to our neighbors data set that indicates the distance decay multiplier and the distance decay multiplier weighted by basal area.

Code
# Loop through distance decay values
for(x in 1:length(decay_values)){
  
  #  Add column for distance decay multiplier for each decay value
  # add _ba suffix to column name - will eventually be summed based on
  # number of individual neighbors
  neighbors[, paste0(decay_names[x], "_N")] <- exponential_decay(mu = decay_values[x], 
                                                   distance = neighbors$d)
  
  # Weight basal area by distance decay multiplier
  # add _ba suffix to column name
  neighbors[, paste0(decay_names[x], "_BA")] <- exponential_decay(
                                                mu = decay_values[x], 
                             distance = neighbors$d) * neighbors$ba_j
}

Depending on how many distance decay values are being investigated, there may be many columns in the data frame.

Code
head(neighbors)
# A tibble: 6 × 34
      i     j     d sp_i  edge_i sp_j     ba_j comparison_type  exp01_N exp01_BA
  <int> <int> <dbl> <chr> <lgl>  <chr>   <dbl> <chr>              <dbl>    <dbl>
1  2973  2993 14.3  cord… FALSE  capp… 1.57e-4 het             6.36e- 7 9.98e-11
2  2973  3122 20.4  cord… FALSE  des2… 4.91e-4 het             1.40e- 9 6.86e-13
3  2973  2974  5.86 cord… FALSE  des2… 7.85e-5 het             2.85e- 3 2.24e- 7
4  2973  3123 27.2  cord… FALSE  des2… 7.85e-5 het             1.47e-12 1.16e-16
5  2973  3121 14.5  cord… FALSE  mico… 1.77e-4 het             5.27e- 7 9.32e-11
6  2973  2936 13.4  cord… FALSE  des2… 7.85e-5 het             1.45e- 6 1.14e-10
# ℹ 24 more variables: exp03_N <dbl>, exp03_BA <dbl>, exp05_N <dbl>,
#   exp05_BA <dbl>, exp07_N <dbl>, exp07_BA <dbl>, exp09_N <dbl>,
#   exp09_BA <dbl>, exp11_N <dbl>, exp11_BA <dbl>, exp13_N <dbl>,
#   exp13_BA <dbl>, exp15_N <dbl>, exp15_BA <dbl>, exp17_N <dbl>,
#   exp17_BA <dbl>, exp19_N <dbl>, exp19_BA <dbl>, exp21_N <dbl>,
#   exp21_BA <dbl>, exp23_N <dbl>, exp23_BA <dbl>, exp25_N <dbl>,
#   exp25_BA <dbl>

2.6 Calculate neighborhood density

Then we summarize the neighborhood density for each focal tree separately for conspecifics and heterospecifics.

Code
# Simple calculations of number of neighbors and total basal area, 
# ignoring distance decay multiplier
neighbors_summary <- neighbors %>%
                     group_by(i, comparison_type) %>%
                     summarise(nodecay_N = n(), # count of neighbors
                          nodecay_BA = sum(ba_j)) # sum of basal area)

# Add in decay columns
neighbors_summary_decay <- neighbors %>%
                            group_by(i, comparison_type) %>%
                      # Select only columns related to distance decay
                            select(starts_with("exp")) %>% 
                      # Summarize them all by summing columns
                            summarise_all(sum) 

# Join both together
neighbors_summary <- left_join(neighbors_summary, 
                               neighbors_summary_decay,
                               by = c("i", "comparison_type"))

# Add treeID column
neighbors_summary$treeID<-dat_first_census$treeID[neighbors_summary$i]


   
# If there are any focal individuals with no neighbors, add values 
# of 0 for neighborhood densities 
      noNeighbors = dat_first_census$treeID[!dat_first_census$treeID 
                                      %in% neighbors_summary$treeID &
                                          !dat_first_census$edge]
      
      # If there are individuals with no neighbors
      if (length(noNeighbors) > 0) {
        neighbors_summary = bind_rows(neighbors_summary, 
                                      expand_grid(i = NA, 
                                             treeID = noNeighbors, 
                             comparison_type = c("het", "cons"))) %>%
                            # Add 0s where NA
                            mutate_all(replace_na, replace = 0) 
        }

# Take a peak at the data
head(neighbors_summary)
# A tibble: 6 × 31
# Groups:   i [6]
      i comparison_type nodecay_N nodecay_BA exp01_N   exp01_BA exp03_N exp03_BA
  <int> <chr>               <int>      <dbl>   <dbl>      <dbl>   <dbl>    <dbl>
1     5 het                   115     0.487  0.0991     4.05e-5   2.03  0.00172 
2     8 het                    94     0.0539 0.181      3.63e-5   1.21  0.000703
3     9 het                   151     2.17   0.0464     1.67e-4   1.65  0.00287 
4    10 het                    76     0.0509 0.00131    4.72e-7   0.464 0.000175
5    11 het                    59     0.0631 0.00859    1.99e-6   0.493 0.000277
6    12 het                    93     0.0506 0.0441     9.51e-6   1.57  0.000745
# ℹ 23 more variables: exp05_N <dbl>, exp05_BA <dbl>, exp07_N <dbl>,
#   exp07_BA <dbl>, exp09_N <dbl>, exp09_BA <dbl>, exp11_N <dbl>,
#   exp11_BA <dbl>, exp13_N <dbl>, exp13_BA <dbl>, exp15_N <dbl>,
#   exp15_BA <dbl>, exp17_N <dbl>, exp17_BA <dbl>, exp19_N <dbl>,
#   exp19_BA <dbl>, exp21_N <dbl>, exp21_BA <dbl>, exp23_N <dbl>,
#   exp23_BA <dbl>, exp25_N <dbl>, exp25_BA <dbl>, treeID <chr>

As described in the main text, it can be advantageous to use total density that includes combined conspecific and heterospecific densities as a covariate, rather than heterospecific density.

Here, we calculate total density by summing heterospecific and conspecific densities.

Code
# First convert to long format which will make it easy to sum across 
# heterospecific and conspecific values
neighbors_summary_long_format <-  neighbors_summary %>%
                                  pivot_longer(cols = -c("i", "comparison_type", "treeID"))

# Sum across heterospecific and conspecific values and rename to 'total'
neighbors_total_long_format <- neighbors_summary_long_format %>%
                                group_by(i, treeID, name) %>%
                                summarize(value = sum(value)) %>%
                                mutate(comparison_type = "total")

# Bind together conspecific and 'total' densities
# remove heterospecific columns
# fill in 0s where there are no neighbors
neighbors_summary_total = bind_rows(neighbors_summary_long_format,
                                    neighbors_total_long_format) %>%
                        # Can filter out heterospecific neighborhood 
                        # values by uncommenting this line of the
                        # objects become too large
                        #  filter(comparison_type != "het") %>% 
                mutate(name = paste0(comparison_type, "_", name)) %>%
                select(-comparison_type) %>%
                pivot_wider(names_from = name, values_from = value, 
                            values_fill = 0)

2.7 Model mortality as a function of neighborhood density

To determine the ‘best’ decay parameter to use, we fit species-specific mortality models using Generalized Additive Models GAMs and compare models based on log likelihoods. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.

We first create our data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge as above. In other datasets, you may have multiple census intervals, where it would be common practice to include ‘year’ or ‘census’ as a random effect in the model, but otherwise the overall approach is similar.

Code
# Join census data with neighborhood data
dat_gam <- left_join(dat_first_census,
                               neighbors_summary_total,
                               by = "treeID")

# Remove edge trees (these will have NAs for densities calcualations)
dat_gam <- dat_gam %>%
                    filter(edge == FALSE)

For each species, we summarize data availability to help determine parameters for the GAM smooth terms.

Code
# Summarize data availability at species level to set the degree of 
# smoothness for GAM smooth terms
sp_summary <- dat_gam %>% 
  group_by(sp) %>% 
  summarise(ndead = sum(mort_next),
            range_con_BA = max(con_nodecay_BA) - min(con_nodecay_BA),
            max_con_BA = max(con_nodecay_BA),
            unique_con_BA = length(unique(con_nodecay_BA)),
            unique_total_BA = length(unique(total_nodecay_BA)),
            range_con_N = max(con_nodecay_N) - min(con_nodecay_N),
            max_con_N = max(con_nodecay_N),
            unique_con_N = length(unique(con_nodecay_N)),
            unique_total_N = length(unique(total_nodecay_N)),
            unique_dbh = length(unique(dbh))
  )

# Filter out species that have 0% mortality
sp_summary <- sp_summary %>%
  filter(ndead > 0)

In this long block of code, we loop over all possible combinations of decay values for different metrics of neighborhood densities weighted by distance for each species and fit a separate GAM for each model. For each GAM, we assess whether the model was able to be fit, and when it was able to be fit, whether it converged or produced warnings. We save the results of successful model fits into a list that we will process later.

For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.

Code
# Initialize list that will save model outputs
res_mod <- list()


# Model run settings
run_settings <- expand_grid(species = unique(sp_summary$sp),
                            decay_con = c("nodecay", decay_names),
                            decay_total = c("nodecay", decay_names),
                            nhood_data_type = c("N", "BA"))


# Loop through model run settings
for(run_settings_row in 1:nrow(run_settings)){
  
  # Extract values from run settings dataframe
  species <- run_settings$species[run_settings_row]
  decay_con <- run_settings$decay_con[run_settings_row]
  decay_total <- run_settings$decay_total[run_settings_row]
  nhood_data_type <- run_settings$nhood_data_type[run_settings_row]

  # Subset down to just focal species
  dat_subset <- dat_gam %>%
    filter(sp == species)

  # Set run name
  run_name <- paste0(species, "_total", decay_total,"_con", 
                     decay_con, "_", nhood_data_type)
  
  # Print status if desired
  # cat("Working on run: ", run_name, " ...\n")

  # Create model formula
  # Initial DBH included as predictor variable
  form =  paste0("mort_next ~ s(dbh, k = k1) + s(total_", decay_total, 
                              "_", nhood_data_type, 
                              ", k = k2)  + s(con_", decay_con, 
                              "_", nhood_data_type, 
                              ", k = k3)")
    
  # Convert into formula
  form <- as.formula(form)
    
    # Choose penalties for model fitting
    # set to default 10 (the same as -1)
    # The higher the value of k, the more flexible the smooth term becomes, allowing for more intricate and wiggly patterns. Conversely, lower values of k result in smoother and simpler representations.
    k1 = k2 = k3 = 10
    if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) { 
      k1 = sp_summary$unique_dbh[sp_summary$sp == species] - 2
    }
    if (k2 > sp_summary$unique_total_N[sp_summary$sp == species]) {
      k2 = sp_summary$unique_total_N [sp_summary$sp == species]- 2
    }
    if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) { 
      k3 = sp_summary$unique_con_N[sp_summary$sp == species] - 2 
    }
    

   # Fit model
   # wrap in a try function to catch any errors
   mod = try(gam(form,
            family = binomial(link=cloglog),
            offset = log(interval),
            data = dat_subset,
            method = "REML"), 
          silent = T)

   
    # Check if model was able to fit
    if (!any(class(mod) == "gam")) {
      # print(paste("gam failed for:", run_name))
    } else {

    # Check if gam converged
    if (!mod$converged) {
      # print(paste("no convergence for:", run_name))
    } else {
  
      # check for complete separation
      # https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r
      # Explore warning "glm.fit: fitted probabilities numerically 0 
      # or 1 occurred"
      eps <- 10 * .Machine$double.eps
      glm0.resids <- augment(x = mod) %>%
        mutate(p = 1 / (1 + exp(-.fitted)),
               warning = p > 1-eps,
               influence = order(.hat, decreasing = T))
      infl_limit = round(nrow(glm0.resids)/10, 0)
      # check if none of the warnings is among the 10% most 
      # influential observations, than it is okay..
      num = any(glm0.resids$warning & glm0.resids$influence < infl_limit)
      
      # complete separation
      if (num) {
       # print(paste("complete separation is likely for:", run_name))
      } else {
        
        # missing Vc
        if (is.null(mod$Vc)) {
         # print(paste("Vc not available for:", run_name))
        } else {
        
          # Add resulting model to list if it passes all checks
          res_mod[[run_name]] <- mod
          
        } # Vc ifelse
      } # complete separation ifelse
    } # convergence ifelse
  } # model available ifelse
} # end run settings loop

2.8 Summarize model fits

Next we will extract summaries for each model with broom::glance() that provides key information like degrees of freedom, log likelihood, AIC, etc.

Code
# Extract summaries for each model into a list
sums = lapply(res_mod, broom::glance) 

# Add a column for model run to each object in the list
sums = Map(cbind, sums, run_name = names(sums)) 
sums = do.call(rbind, sums) # Bind elements of list together by rows
rownames(sums) <- NULL # Remove row names

# Separate run name into columns for species, decay, and density type
sums <- sums %>%
        separate(run_name, into = c("sp", "decay_total", 
                                    "decay_con", "density_type"), 
                 remove = FALSE)

# Remove 'total' and 'con' from decay columns
sums$decay_total <- gsub("total", "", sums$decay_total)
sums$decay_con <- gsub("con", "", sums$decay_con)

# Rearrange columns  
sums <- sums %>%
          select(run_name, sp, decay_total, decay_con, density_type, 
                 everything()) 

# Take a look at the model summaries
head(sums)
                       run_name     sp decay_total decay_con density_type

1 cappfr_totalnodecay_connodecay_N cappfr nodecay nodecay N 2 cappfr_totalnodecay_connodecay_BA cappfr nodecay nodecay BA 3 cappfr_totalexp01_connodecay_N cappfr exp01 nodecay N 4 cappfr_totalexp01_connodecay_BA cappfr exp01 nodecay BA 5 cappfr_totalexp03_connodecay_N cappfr exp03 nodecay N 6 cappfr_totalexp03_connodecay_BA cappfr exp03 nodecay BA df logLik AIC BIC deviance df.residual nobs adj.r.squared 1 4.231224 -22.61561 54.10008 70.37389 45.23123 285.7688 290 0.001949120 2 7.360197 -17.20583 51.02825 81.51869 34.41167 282.6398 290 0.048186759 3 5.063787 -22.64430 56.52330 77.13828 45.28861 284.9362 290 -0.000915769 4 8.081624 -17.53607 53.89847 88.44366 35.07214 281.9184 290 0.134401283 5 8.071055 -17.86595 54.97936 90.29731 35.73189 281.9289 290 0.079728079 6 7.926840 -17.62220 53.84700 87.98169 35.24439 282.0732 290 0.133467764 npar 1 28 2 28 3 28 4 28 5 28 6 28

Due to limited sample sizes, it is likely that GAMs will not fit for each species. For example, in the table below of summary data for species where models did not successfully fit/converge, there are several instances where all individuals of the species survived during the census, leading to no mortality events to use in the model.

We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.

Code
# Tally up number of model runs by species and total decay values
table(sums$sp, sums$decay_total)
        
         exp01 exp03 exp05 exp07 exp09 exp11 exp13 exp15 exp17 exp19 exp21
  cappfr    28    28    28    28    28    28    28    28    28    28    28
  cordbi    27    28    28    28    28    28    28    28    28    28    27
  des2pa    28    28    28    28    28    28    28    28    28    28    28
  micone    28    28    28    28    28    28    28    28    28    28    28
  pentma    28    28    28    28    28    23    27    28    28    28    28
  stylst    28    28    28    28    28    28    28    28    28    28    28
        
         exp23 exp25 nodecay
  cappfr    28    28      28
  cordbi    28    28      28
  des2pa    28    28      28
  micone    28    28      28
  pentma    28    28      28
  stylst    28    28      28
Code
# get incomplete run-site-species combinations
run_counts_by_sp <- sums %>% 
                    group_by(sp) %>%
                    tally() %>%
                    # Join with overall species list
                    left_join(sp_summary %>% select(sp), ., by = "sp") 

# Get expected number of runs if all models work
expected_total_runs <- run_settings %>%
                       group_by(species) %>%
                       tally() %>%
                       pull(n) %>%
                       max()

# Save species names where they didn't have all expected combinations 
# of model runs
incomplete = run_counts_by_sp$sp[run_counts_by_sp$n < 
                                   expected_total_runs | 
                                   is.na(run_counts_by_sp$n)]

# Species with successful runs
head(sp_summary[!sp_summary$sp %in% incomplete, ])
# A tibble: 4 × 11
  sp     ndead range_con_BA max_con_BA unique_con_BA unique_total_BA range_con_N
  <chr>  <dbl>        <dbl>      <dbl>         <int>           <int>       <dbl>
1 cappfr     5      0.0183     0.0190            248             290          30
2 des2pa   174      0.144      0.149            1318            1341         123
3 micone    38      0.00293    0.00293            49              66          12
4 stylst     3      0.0165     0.0165             92             104          13
# ℹ 4 more variables: max_con_N <dbl>, unique_con_N <int>,
#   unique_total_N <int>, unique_dbh <int>
Code
# Species without successful runs
head(sp_summary[sp_summary$sp %in% incomplete, ])
# A tibble: 4 × 11
  sp     ndead range_con_BA max_con_BA unique_con_BA unique_total_BA range_con_N
  <chr>  <dbl>        <dbl>      <dbl>         <int>           <int>       <dbl>
1 acalma     2      0.00446    0.00446             4               7           2
2 cordbi     7      0.249      0.249              36              43          10
3 pentma    12      0.00363    0.00363            20              33           6
4 sponra     3      0.0760     0.0760             12              18           5
# ℹ 4 more variables: max_con_N <dbl>, unique_con_N <int>,
#   unique_total_N <int>, unique_dbh <int>

2.9 Selecting optimum decay parameter values across all species

We then summarize different model criteria across all species runs. To look for the optimal value for decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the resulting parameter combination with the highest summed log likelihood.

One crucial point is that each run of the grid search should be done with the same set of trees, even if smaller neighborhodoes not need a buffer of 30 m.

Code
sums_total <- sums %>% 
              filter(!sp %in% incomplete) %>%
              group_by(decay_total, decay_con, density_type) %>%
              summarise(nvalues = n(),
                        sumlogLik = sum(logLik),
                        meanlogLik = mean(logLik)) %>%
              arrange(decay_total, decay_con, density_type)

sums_total
# A tibble: 392 × 6
# Groups:   decay_total, decay_con [196]
   decay_total decay_con density_type nvalues sumlogLik meanlogLik
   <chr>       <chr>     <chr>          <int>     <dbl>      <dbl>
 1 exp01       exp01     BA                 4     -556.      -139.
 2 exp01       exp01     N                  4     -554.      -138.
 3 exp01       exp03     BA                 4     -553.      -138.
 4 exp01       exp03     N                  4     -558.      -140.
 5 exp01       exp05     BA                 4     -554.      -138.
 6 exp01       exp05     N                  4     -561.      -140.
 7 exp01       exp07     BA                 4     -557.      -139.
 8 exp01       exp07     N                  4     -563.      -141.
 9 exp01       exp09     BA                 4     -558.      -140.
10 exp01       exp09     N                  4     -564.      -141.
# ℹ 382 more rows

We then create a heatmap plot Figure 2.4 of summed log likelihoods for all fixed devay (mu) parameter combinations across all species, with the optimal parameter combination marked with an X

Code
# Find optimum value separately for N and BA
  optimum <- sums_total %>%
               group_by(density_type) %>%
               slice_max(sumlogLik)
  
# Plot heatmap of log likelihood values
  ggplot(sums_total, aes(x = decay_total, y = decay_con, 
                         fill = sumlogLik)) +
    geom_tile(width = 0.9, height = 0.9, col = "black") + 
    scale_fill_viridis_c() + # viridis color palette
    geom_label(data = optimum, label = "X") + 
    labs(x = "Decay total density", y = "Decay conspecific density", 
         fill = "sumlogLik") + 
    facet_wrap(~density_type, ncol = 1) + 
    theme_bw(12) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
                                     hjust=1))

Figure 2.4: Heatmap of optimal values for decay constants

For this data set, the following are the optimal decay parameter values across all species separately for neighborhood density calculated by abundance (N) and by basal area (BA)

Code
optimum
# A tibble: 2 × 6
# Groups:   density_type [2]
  decay_total decay_con density_type nvalues sumlogLik meanlogLik
  <chr>       <chr>     <chr>          <int>     <dbl>      <dbl>
1 exp19       exp25     BA                 4     -542.      -135.
2 exp03       exp01     N                  4     -542.      -135.

2.10 Selecting optimum decay parameter values separately for each species

Alternatively, it may be useful to determine optimum decay parameters on a species by species basis. To do this, we find the maximum log liklihood for each species separately across decay parameter combination and choose the resulting parameter combination with the highest log likelihood.

Code
sums_total_by_spp <- sums %>% 
              filter(!sp %in% incomplete) %>%
              group_by(decay_total, decay_con, density_type, sp) %>%
              summarise(nvalues = n(),
                        sumlogLik = sum(logLik),
                        meanlogLik = mean(logLik)) %>%
              arrange(decay_total, decay_con, density_type)

sums_total_by_spp
# A tibble: 1,568 × 7
# Groups:   decay_total, decay_con, density_type [392]
   decay_total decay_con density_type sp     nvalues sumlogLik meanlogLik
   <chr>       <chr>     <chr>        <chr>    <int>     <dbl>      <dbl>
 1 exp01       exp01     BA           cappfr       1    -16.6      -16.6 
 2 exp01       exp01     BA           des2pa       1   -489.      -489.  
 3 exp01       exp01     BA           micone       1    -43.7      -43.7 
 4 exp01       exp01     BA           stylst       1     -6.84      -6.84
 5 exp01       exp01     N            cappfr       1    -14.3      -14.3 
 6 exp01       exp01     N            des2pa       1   -489.      -489.  
 7 exp01       exp01     N            micone       1    -41.3      -41.3 
 8 exp01       exp01     N            stylst       1     -8.64      -8.64
 9 exp01       exp03     BA           cappfr       1    -20.7      -20.7 
10 exp01       exp03     BA           des2pa       1   -485.      -485.  
# ℹ 1,558 more rows

We create a heatmap plot Figure 2.5 of log likelihoods for all fixed decay parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here, because with data sets containing many species, it’s hard to visualize all the species on one graph and you may wish to subdivide the plot further for visualization.

Code
  sums <- sums %>%
    filter(sp %in% c("cappfr", "cordbi", "des2pa")) %>%
    group_by(sp) %>%
    # Scale loglikihood by species to help with visualization
    mutate(logLik_scaled = scale(logLik)) %>% 
    ungroup()
 
# Find optimum value separately for N and BA
  optimum_by_sp <- sums %>%
               group_by(sp, density_type) %>%
               slice_max(logLik_scaled, with_ties = FALSE)

# Plot heatmap of log likelihood values
  ggplot(sums, aes(x = decay_total, y = decay_con,
                   fill = logLik_scaled)) +
    geom_tile(width = 0.9, height = 0.9, col = "black") +
    geom_label(data = optimum_by_sp, label = "X") +
    labs(x = "Decay total density", y = "Decay conspecific density", 
         fill = "logLik") +
    facet_wrap(~sp + density_type, ncol = 2, scales = "free") +
    scale_fill_viridis_c() + # viridis color palette
    theme_bw(12) + 
    theme(legend.position = "right") + 
    labs(fill = "Log likelihood\n(scaled)") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
                                     hjust=1))

Figure 2.5: Heatmap of optimal values for decay constants separately for each species

For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):

Code
optimum_by_sp
# A tibble: 6 × 15
# Groups:   sp, density_type [6]
  run_name sp    decay_total decay_con density_type    df    logLik   AIC    BIC
  <chr>    <chr> <chr>       <chr>     <chr>        <dbl>     <dbl> <dbl>  <dbl>
1 cappfr_… capp… exp19       exp25     BA           11.1  -8.74e+ 0  43.3   90.6
2 cappfr_… capp… exp03       exp01     N            11.0  -1.18e+ 1  49.6   97.4
3 cordbi_… cord… exp05       exp03     BA            7.61 -1.25e+ 1  43.1   58.9
4 cordbi_… cord… exp01       exp07     N            14.8  -2.61e-12  31.8   59.5
5 des2pa_… des2… exp25       exp09     BA            8.36 -4.82e+ 2 984.  1037. 
6 des2pa_… des2… exp03       nodecay   N             7.33 -4.82e+ 2 981.  1026. 
# ℹ 6 more variables: deviance <dbl>, df.residual <dbl>, nobs <int>,
#   adj.r.squared <dbl>, npar <int>, logLik_scaled <dbl[,1]>